Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources
نویسندگان
چکیده
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as, multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. Test data set, a web tool, source codes and supplementary data are available at: http://www.probtf.org.
منابع مشابه
Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities
MOTIVATION Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques, such as Chromatin Immunoprecipitation (ChIP) provide important information about the architecture of the regulatory networks in the cell. However, it is very ...
متن کاملNestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence
NestedMICA is a new, scalable, pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find opt...
متن کاملProbabilistic Inference for Nucleosome Positioning with MNase-Based or Sonicated Short-Read Data
We describe a model-based method, PING, for predicting nucleosome positions in MNase-Seq and MNase- or sonicated-ChIP-Seq data. PING compares favorably to NPS and TemplateFilter in scalability, accuracy and robustness to low read density. To demonstrate that PING predictions from widely available sonicated data can have sufficient spatial resolution to be to be useful for biological inference, ...
متن کاملPICS: probabilistic inference for ChIP-seq.
ChIP-seq combines chromatin immunoprecipitation with massively parallel short-read sequencing. While it can profile genome-wide in vivo transcription factor-DNA association with higher sensitivity, specificity, and spatial resolution than ChIP-chip, it poses new challenges for statistical analysis that derive from the complexity of the biological systems characterized and from variability and b...
متن کاملComputational Models of Gene Regulatory Sequences and Responses
The regulation and responses of genes involve complex systems of relationships between genes, proteins, DNA, and a host of other molecules that are involved in every aspect of cellular activity. I present algorithms that learn expressive computational models of cis-regulatory modules (CRMs) and gene-regulatory networks. These models are expressive because they are able to represent key aspects ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PLoS ONE
دوره 3 شماره
صفحات -
تاریخ انتشار 2008